1. INTRODUCTION

The pattern of weather changes in an area is determined by several variables, including the location 
and geographic shape of the region. Indonesia is a tropical country with a diverse climate and is strongly 
influenced by rainfall patterns [1]. The high rainfall in the tropics is caused by several interrelated factors, 
such as the inter-tropical convergence zone (ITCZ), Indonesian throughflow (ITF), El Niño Southern
Oscillation (ENSO), Pacific decadal oscillation (PDO), Indian Ocean dipole mode (IODM), and
Madden-Julian oscillation (MJO) [2]. Moreover, tropical cyclones, which are storms characterized by low
atmospheric pressure and strong winds, are also known to cause extreme rain events in Indonesia [3]. All
regions in Indonesia are predicted to experience changes in rainfall patterns [4]. In addition, these trends
can vary significantly on monthly, seasonal, and even annual timescales [5]. This variation makes the
weather in Indonesia changeable and difficult to predict. Extreme weather changes cause losses in various
aspects of life. Therefore, weather forecasting technology is urgently needed and continues to be developed
with increasingly high accuracy.

The authors of [6]-[8] developed weather forecasting systems for long-term prediction using artificial
intelligence. Some institutions and companies even have their own weather stations to collect weather data
and predict upcoming weather events. These data are used for their own purposes and, in some cases, shared
with third parties. However, such weather stations are professional-grade and expensive, so they are not
affordable for small companies [9].

Some fields require short-term forecasts for specific local areas, such as agriculture, construction, air
travel and transportation, and solar power generation. The importance of these predictions is increasing, 
especially in the industrial sector. Many authors have developed short-term weather forecasting systems, 
mainly using artificial intelligence and data mining approaches to make predictions [10]-[12]. Meanwhile, 
some authors also use internet of things (IoT) technology and Raspberry Pi to create monitoring systems to 
gain information about environmental conditions on a more local level [13]-[15]. The systems could monitor 
surrounding weather conditions such as humidity, temperature, soil moisture, rainfall, and light intensity. The 
association between the rain-determining parameters can be used as an initial analysis in this study. 

To overcome this problem, in 2020 the authors built a prototype of a low-cost mini weather station
based on artificial intelligence for temporal and spatial data collection, which is used for short-term
predictions [16]. As a further step from that research, this study aims to determine the association of
temperature and humidity with wind direction in Gowa Regency, Indonesia, because wind direction is one of
the significant indicators in weather prediction. Weather data are obtained from low-cost weather stations
that use IoT technology. This research begins the development of a temporal-based association system for
the temperature, humidity, and wind direction parameters. The association of these parameters can be used
as a reference for how wind direction relates to temperature and humidity in Gowa Regency, South Sulawesi.
This research will continue to be developed by adding other parameters that affect short-term weather
forecasts.

Two popular association algorithms are used to determine the association between the temperature,
humidity, and wind direction parameters, namely the apriori algorithm and FP-growth. Both algorithms have
been applied in various fields, such as disaster management [17], medicine [18], industry [19], education [20],
[21], and weather prediction [22]. The advantage of the apriori algorithm is its capability to significantly
reduce the number of itemsets considered in the database, as it focuses on generating and discovering the
most frequent itemsets. It is also easy to implement and study because its data structure is straightforward.
Apriori's limitation is that it requires a scan of the database at every level of the breadth-first search to
create the candidate itemsets and calculate the corresponding support counts. Apriori is therefore
inefficient when memory capacity is limited and there are many transactions. It also requires the user to set
the minimum support count and minimum confidence. Because the underlying association rules of the original
data are not known in advance, the values set by the user may not be appropriate [23], [24]. Because of these
problems, FP-Growth was also applied to find the association relationships in the weather parameter data.
The performance of the two algorithms is then compared to find the optimal solution. After the association
stage, as an additional contribution of this research, time patterns were identified and analyzed using the
temporal association rule (TAR). The structure of this paper is as follows. In section 2, the research method
and its stages are described. In section 3, the results are discussed. Finally, section 4 concludes the paper
and discusses future work.


2. RESEARCH METHOD 

The weather station [16] is built from a Raspberry Pi circuit with a long range (LoRa) Hat and
gateway. Meteorological sensors are connected to the LoRa shield as nodes. A Raspberry Pi Type B
model is embedded in the LoRa Hat board to facilitate long-range data transmission. The sensor nodes
comprise a DHT22 for temperature and humidity measurement, an anemometer to measure wind speed, and a
wind vane sensor to identify wind direction. An Arduino Mega 2560 microcontroller is used to read and
process the data from the sensor nodes. An IP channel is determined in advance to allow the LoRa Hat to find
the data. Upon retrieval by the Raspberry Pi, the data are added to the local database with a timestamp. The
weather station can operate over approximately 5-10 meters with an installation height of 20 meters above
the ground, as shown in Figure 1. In this study, the association algorithms are implemented to examine
patterns through temporal analysis. The research stages in Figure 1 consist of data collection from the
low-cost weather station and data analysis with association algorithms. Winds bring moisture into the
atmosphere and carry hot or cold air into a region, affecting weather patterns. Hence, a change in wind
results in a change of weather. Identifying the wind direction provides essential insight into the
temperature to expect. Temperature and humidity are likewise essential parameters for examining
relationships in weather patterns.


2.1. Data collection 

This research used data from the low-cost weather station built in [16]. The input data
were obtained from the weather station's sensor readings for temperature, humidity, and wind direction. Data
were collected at the weather station every 5 minutes during December 2019 in Gowa Regency, South
Sulawesi, Indonesia.


2.2. Data preprocessing

The preprocessing stages consist of data selection, interpolation, normalization, and categorization.
Data selection keeps the relevant data to be processed by the system so that the data do not contain noise
or NULL (missing) values. Of the 1,440 raw records, 1,261 were selected and then interpolated to fill in the
data missing from the weather station. The next step is normalization, which scales the attribute values of
the data to a specific range. In this study, the min-max normalization method was used. Before the
association rule algorithms are applied, the data are categorized in the weather categorization stage based
on conditions in the Gowa region. The normalized temperature, humidity, and wind direction data are
categorized into the intervals shown in Tables 1 to 3, respectively.
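
The interpolation and min-max normalization steps can be illustrated with a minimal sketch. The paper does not state which interpolation method was used, so linear interpolation is an assumption here, and the function names (`interpolate_missing`, `min_max_normalize`) and the sample readings are hypothetical.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest known readings.

    Assumption: gaps at the very start or end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one reading is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def min_max_normalize(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Scale values linearly so that min(values) maps to lo and max(values) maps to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant series carries no spread; map every value to lo.
        return [lo for _ in values]
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]


# Hypothetical temperature readings (°C) with missing values.
temps = [24.0, None, 26.0, 30.0, None, None, 27.0]
filled = interpolate_missing(temps)
normalized = min_max_normalize(filled)
```

Min-max scaling preserves the ordering of the readings, so category boundaries expressed in physical units can be mapped to the normalized scale with the same transformation.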


[Figure 1 flowchart stages: data collection; data preprocessing; data analysis with association algorithm
(generate association rules using the apriori and FP-Growth algorithms; analyze time pattern rules using
the TAR algorithm); result of parameter association]

Figure 1. The real-time weather station


Table 1. Categorization of temperature data

Rating      Value (°C)
Higher      ≥35
Moderate    28-34
Lower       21-27
Lowest      14-20

Table 2. Categorization of relative humidity

Rating        Value (%)
Humid         >80
Less Humid    50-79
Less Dry      0-49

Table 3. Categorization of wind direction

Category      Direction (°)
North         0
North-East    45
East          90
South-East    135
South         180
South-West    225
West          270
North-West    315
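
The categorization in Tables 1 to 3 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it categorizes readings in physical units, assigns values that fall between the tabulated integer bounds to the lower category, and snaps wind bearings to the nearest of the eight compass points. These choices, the function names, and the item labels (patterned on the rule labels in Table 4) are all assumptions.

```python
WIND_LABELS = ["North", "NorthEast", "East", "SouthEast",
               "South", "SouthWest", "West", "NorthWest"]


def categorize_temperature(celsius: float) -> str:
    """Map a temperature in °C to a Table 1 rating (bounds treated as thresholds)."""
    if celsius >= 35:
        return "HigherTemp"
    if celsius >= 28:
        return "ModerateTemp"
    if celsius >= 21:
        return "LowerTemp"
    return "LowestTemp"


def categorize_humidity(percent: float) -> str:
    """Map relative humidity in % to a Table 2 rating."""
    if percent > 80:
        return "Humid"
    if percent >= 50:
        return "LessHumid"
    return "LessDry"


def categorize_wind_direction(degrees: float) -> str:
    """Snap a bearing in degrees to the nearest Table 3 compass point."""
    sector = int(((degrees % 360) + 22.5) // 45) % 8
    return WIND_LABELS[sector]


def to_transaction(temp_c: float, humidity_pct: float, wind_deg: float) -> frozenset:
    """Turn one sensor record into an itemset for association rule mining."""
    return frozenset({
        categorize_temperature(temp_c),
        categorize_humidity(humidity_pct),
        categorize_wind_direction(wind_deg),
    })


# Hypothetical reading: 25.3 °C, 88 % relative humidity, wind bearing 310°.
example = to_transaction(25.3, 88.0, 310.0)
```

Each record then becomes a three-item transaction, which is the input form expected by the association algorithms in the next section.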


2.3. Data analysis with association algorithms

Association rule mining (ARM) is a prominent and well-explored method for determining relations
among itemsets in large databases. It was initially introduced for discovering regularities between products in
large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets [25]. The discovered
connections between different itemsets are presented in the form of association rules. The basic form of an
association rule is A → B, where A is the antecedent and B is the consequent. Its confidence is the proportion
of the transactions containing A that also contain B, that is, the conditional probability of the consequent
given the antecedent [21]. The popular ARM algorithms are apriori, Eclat, dEclat, FP-tree growth, H-Mine,
FIN, AprioriTID, and RElim [26].

Association rule mining works in two steps. First, all frequent itemsets are generated. These are sets
of items that have at least the given minimum support. Second, association rules are generated from the
frequent itemsets found in the first step [27]. The importance of these rules is usually measured by two
metrics: support and confidence. Support is the proportion of transactions in the database in which itemset
A appears. Support shows the popularity of an itemset and can be used to filter out itemsets with low
occurrence. Confidence signifies the likelihood that B occurs in a transaction when A occurs. It is the
conditional probability of the consequent given the antecedent. The user defines minimum threshold values
for support and confidence to eliminate less frequent rules [28]. The minimum support threshold (MST),
which specifies how often an itemset must appear, is set at 0.5. The minimum confidence threshold (MCT),
which specifies the minimum confidence of a resulting rule, is set at 0.7.


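$$\mathrm{Support}(A) = \frac{\text{Number of transactions in which } A \text{ appears}}{\text{Total number of transactions}} \tag{1}$$

$$\mathrm{Confidence}(A \rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)} \tag{2}$$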

Besides support and confidence, an association rule also has a lift ratio, which is another measure of
its significance. The lift ratio indicates how strong the association between the parameters is. Lift > 1
indicates a positive association between the formed itemsets. This value also indicates the validity of an
association rule, as shown in (3).


$$\mathrm{Lift}(A \rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A) \times \mathrm{Support}(B)} \tag{3}$$
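
As a worked illustration of support, confidence, and lift, the following sketch computes the three measures on a small set of transactions. The transactions are hypothetical and are not the station data. Lift is computed as confidence divided by the support of the consequent, which is the standard definition of lift.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Share of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """Support of A together with B, divided by the support of A."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions) -> float:
    """Confidence of A -> B divided by the support of B; above 1 is a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical categorized records, one itemset per reading.
transactions = [
    frozenset({"Humid", "LowerTemp", "NorthWest"}),
    frozenset({"Humid", "LowerTemp", "NorthWest"}),
    frozenset({"Humid", "LowerTemp", "West"}),
    frozenset({"LessHumid", "ModerateTemp", "South"}),
    frozenset({"LessHumid", "ModerateTemp", "SouthEast"}),
]

A = {"Humid", "LowerTemp"}
B = {"NorthWest"}
s = support(A, transactions)          # 3 of 5 transactions contain A
c = confidence(A, B, transactions)    # 2 of the 3 transactions with A also contain B
l = lift(A, B, transactions)
```

In this toy data the lift is greater than 1, so humid and lower-temperature readings co-occur with a north-west wind more often than independence would predict.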


2.3.1. Apriori algorithm

Apriori is an algorithm for finding frequent itemsets in a dataset by generating candidate itemsets
using prior knowledge of the properties of frequent itemsets. It searches for a series of frequent sets of
items in the dataset and builds associations and correlations between the itemsets. The apriori algorithm is
described as follows [29].


Algorithm apriori
C_k: candidate itemsets of size k
L_k: frequent itemsets of size k
L_1 = {frequent items};
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that are contained in t
    L_{k+1} = candidates in C_{k+1} with min_support
end
return ∪_k L_k;
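
The pseudocode above can be sketched in Python as follows. This is a minimal illustration of the level-wise search under simple assumptions (transactions held in memory, support expressed as a fraction). It is not the implementation used in this study, and the sample transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def apriori(transactions: Iterable[Iterable[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset with its support (fraction of transactions)."""
    db: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(db)

    def frequent(candidates):
        # One full scan of the database per level, as in the pseudocode.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    level = frequent({frozenset([item]) for t in db for item in t})  # L_1
    result = dict(level)
    k = 1
    while level:
        prev = set(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        level = frequent(candidates)                                  # L_{k+1}
        result.update(level)
        k += 1
    return result


# Hypothetical categorized records, one itemset per reading.
sample = [
    {"Humid", "LowerTemp", "NorthWest"},
    {"Humid", "LowerTemp", "NorthWest"},
    {"Humid", "LowerTemp", "West"},
    {"LessHumid", "ModerateTemp", "South"},
    {"LessHumid", "ModerateTemp", "SouthEast"},
]
frequent_itemsets = apriori(sample, min_support=0.4)
```

The prune step relies on the property that every subset of a frequent itemset is also frequent, which is the prior knowledge that keeps the candidate sets small.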


2.3.2. Frequent pattern (FP) growth algorithm 

The FP-Tree works by overlapping itemsets that share the same prefix path, which makes the information
in the data set highly compressed. It scans the database only twice and does not need any candidate
generation. In the first scan, the algorithm finds the support value of each item. It discards the
infrequent items and sorts the remaining items in decreasing order of support. Then, it creates the FP-tree
in the second scan. The tree's root is labeled null, while the tree nodes represent the items of each
transaction and are marked with the item's name and a counter. The nodes are linked one after the other in
the order of their support values. Different transactions can have overlapping paths when they share items
(when they have the same prefix). In this case, the counters are incremented. Otherwise, when there are no
overlapping items, new nodes are created. Finally, frequent itemsets are extracted from the FP-Tree by
reading it from the last node to the root (bottom-up method) [30].
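
The two-scan construction of the FP-tree described above can be sketched as follows. The sketch only builds the tree and reads prefix paths bottom-up. It omits the recursive mining of conditional trees, so it is not a complete FP-growth implementation. The class and function names, the alphabetical tie-break for items with equal support, and the sample transactions are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item              # None for the root (the null label)
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(transactions: Iterable[Iterable[str]], min_count: int):
    db = [set(t) for t in transactions]
    # First scan: count the support of every item and drop infrequent ones.
    counts = Counter(item for t in db for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {i: [] for i in frequent}
    # Second scan: insert each transaction with items in decreasing support order.
    for t in db:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:         # no shared prefix: create a new node
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1          # shared prefix: increment the counter
            node = child
    return root, header


def prefix_paths(item: str, header: Dict[str, List[FPNode]]) -> List[Tuple[List[str], int]]:
    """Read the tree bottom-up: walk from each node of `item` towards the root."""
    paths = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        paths.append((list(reversed(path)), node.count))
    return paths


# Hypothetical categorized records, one itemset per reading.
sample = [
    {"Humid", "LowerTemp", "NorthWest"},
    {"Humid", "LowerTemp", "NorthWest"},
    {"Humid", "LowerTemp", "West"},
    {"LessHumid", "ModerateTemp", "South"},
    {"LessHumid", "ModerateTemp", "SouthEast"},
]
root, header = build_fp_tree(sample, min_count=2)
northwest_paths = prefix_paths("NorthWest", header)
```

In this toy data the first three transactions share the prefix Humid, LowerTemp, so they occupy a single branch whose counters record how many transactions pass through each node.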


2.3.3. Temporal association rules 

Based on Figure 1, after preprocessing, the next step is to feed the data into the apriori and
FP-growth association rule algorithms. These two classical association rule methods can only identify
patterns that occur frequently. Because data from a weather station are temporal, these methods alone are
not sufficient. Combining apriori or FP-growth with temporal association rule (TAR) analysis is expected to
provide information on the relationship between weather data variables together with time scale
parameters. The frequent itemsets obtained from the algorithms generate rules that meet the specified
minimum support and confidence. The implementation of TAR with these two methods uses the same
parameters with the time dimension added, so the support and confidence it reports are those obtained
from apriori and FP-growth.
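
The text does not spell out the TAR procedure, so the following is only one possible reading, offered as a hedged sketch: for a given rule, it finds the hours of the day in which the rule's items occur in at least a chosen share of that hour's records and merges consecutive hours into time ranges. The function name `rule_time_ranges`, the parameter `min_hourly_support`, and the sample records are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def rule_time_ranges(records: Iterable[Tuple[datetime, Iterable[str]]],
                     rule_items: Iterable[str],
                     min_hourly_support: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start_hour, end_hour) ranges, end exclusive, where the rule is frequent.

    An hour of the day is kept when the share of that hour's records containing
    every item of the rule is at least min_hourly_support.
    """
    wanted = frozenset(rule_items)
    total = [0] * 24
    hits = [0] * 24
    for timestamp, items in records:
        total[timestamp.hour] += 1
        if wanted <= frozenset(items):
            hits[timestamp.hour] += 1
    kept = [h for h in range(24)
            if total[h] and hits[h] / total[h] >= min_hourly_support]
    ranges: List[Tuple[int, int]] = []
    for h in kept:
        if ranges and ranges[-1][1] == h:
            ranges[-1] = (ranges[-1][0], h + 1)   # extend the current range
        else:
            ranges.append((h, h + 1))             # start a new range
    return ranges


# Hypothetical records: the rule's items appear at night and in the late afternoon.
rule = {"Humid", "LowerTemp", "NorthWest"}
night_hours = set(range(0, 8)) | set(range(15, 23))
records = [
    (datetime(2019, 12, 1, hour, 0),
     rule if hour in night_hours else {"LessHumid", "ModerateTemp", "South"})
    for hour in range(24)
]
ranges = rule_time_ranges(records, rule)
```

A range such as (0, 8) corresponds to a time period written as 00.00-08.00 in the notation of this paper.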


3. RESULTS AND DISCUSSION 

Based on the implementation of the two algorithms, five rules are obtained, as listed in Table 4. The
association rules of the two algorithms are formed using the support and confidence values. The results
shown in Table 4 are the rules with the highest confidence values among the combinations of itemsets that
occur frequently in the weather station data. Apart from confidence, lift is also used to assess the validity
of the resulting rules. The lift ratio indicates how strong the relationship between the parameters is.
Lift > 1 indicates a positive association between the formed itemsets. For Rule 1, the same confidence
value is obtained from apriori and FP-Growth. However, the resulting lift values differ, although both show
a positive association between the Humid, LowerTemp, and NorthWest wind direction categories. The
temporal feature of this rule spans 16 hours in two time ranges, 00.00-08.00 and 15.00-23.00. The lift ratios
in Table 4 show that the rule with the longest duration has a small lift value. The periods in which its
itemset is frequent do not occur on the same day. In contrast, Rule 5 gives high lift values of 17.27 for
FP-growth and 2.45 for apriori. Its pattern of data frequency in the 11.00-15.00 time period is obtained
consistently every day, with a duration of 4 hours.


Table 4. Rules generated by the apriori and FP-Growth algorithms

No  Generated rule                          Apriori            FP-Growth          Time period
                                            Confidence  Lift   Confidence  Lift
1   Humid, LowerTemp → NorthWest            0.98        1.66   0.98        2.78   00.00-08.00, 15.00-23.00
2   LessHumid, ModerateTemp → SouthEast     1           2.45   1           12.12  13.00-17.00
3   Humid, LowerTemp → West                 0.98        1.68   1           8.63   00.00-08.00, 19.00-23.00
4   LessHumid, ModerateTemp → South         0.98        2.36   1           6.82   11.00-20.00
5   LessHumid, ModerateTemp → SouthWest     1           2.45   1           17.27  11.00-15.00


The knowledge represented by the five association rules in Table 4 shows that the wind tends to blow
to the west with a very high humidity level (Humid), in the 50-70% range, and warm temperatures of
21-27 °C, mostly at night. During the day, on average, the wind direction tends toward the South and
Southeast with a high humidity level (LessHumid) and moderate temperature. This pattern applies to Gowa
Regency, South Sulawesi. Figure 2 shows the relationship between wind direction and the frequency of
the data itemsets.

Figure 2 shows that the northwest wind direction occurs with high probability under lower temperature
and humid conditions during the data collection period. Although these conditions are found often, other
associated conditions are also formed by the two algorithms, as shown in Table 4. As is well known, wind is
one of the conditions that affect temperature and humidity. The duration of the air mass movement caused
by differences in air pressure between two locations can affect changes in wind direction. Westerly winds
generally influence the wind direction in Indonesia in December.

The observed conditions in the study area in December are consistent with the results of the tested
association system. The northwest monsoon has high humidity and moderate temperature, and the average
wind movement tends to be dominated by flow from north to south [31]. The resulting associations can serve
as a reference for rainfall around the area, in terms of both the relationships between weather parameters
and the duration of the weather conditions.

Figure 2. Interpretation of item relationships and the number of data itemset frequencies


4. CONCLUSION 

This study uses the FP-Growth and apriori association rule algorithms to find associations between
weather parameters in Gowa Regency. The weather data are obtained from a low-cost weather station. With
MST and MCT set at 0.5, five association rules are obtained. The results show a positive association
between humidity, temperature, and wind direction. At night, the wind predominantly blows to the west with
a very high humidity level of 50-70% and warm temperatures of 21-27 °C. During the day, the wind direction
tends toward the South and Southeast with high humidity and moderate temperature. In the future, this
research can be extended with a more complete set of weather indicators, such as associations of
temperature, humidity, wind direction, wind speed, and rainfall, to obtain better prediction results. In
addition, the prediction process can be optimized with other algorithms that provide better short-term
temporal prediction results.